Extracting noun phrases for all of MEDLINE

Authors

  • Nuala A. Bennett
  • Qin He
  • Kevin Powell
  • Bruce R. Schatz
Abstract

A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization; part-of-speech tagging; noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature.
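The three-module design named in the abstract (tokenization, part-of-speech tagging, noun phrase identification) can be illustrated with a minimal sketch. This is not the authors' parser: the tiny lexicon, the default-to-noun tagging rule, the adjective/noun chunking pattern, and all function names are assumptions made purely for illustration. The sketch also counts total versus unique phrases, mirroring the two figures the abstract reports, and computes precision and recall with the standard set-based definitions, since the abstract does not describe the exact evaluation procedure.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> coarse part-of-speech tag.
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "of": "PREP", "in": "PREP", "with": "PREP",
    "was": "VERB", "were": "VERB", "is": "VERB", "showed": "VERB",
    "chronic": "ADJ", "renal": "ADJ", "acute": "ADJ",
}


def tokenize(text):
    """Module 1: split text into lowercase word tokens and punctuation marks."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*|[^\sa-z0-9]", text.lower())


def tag(tokens):
    """Module 2: lexicon lookup; unknown alphanumeric tokens default to NOUN."""
    tagged = []
    for tok in tokens:
        if tok in LEXICON:
            tagged.append((tok, LEXICON[tok]))
        elif tok[0].isalnum():
            tagged.append((tok, "NOUN"))
        else:
            tagged.append((tok, "PUNCT"))
    return tagged


def noun_phrases(tagged):
    """Module 3: collect maximal runs of ADJ/NOUN tokens that end in a NOUN."""
    phrases, run = [], []
    # A sentinel entry flushes the final run.
    for tok, pos in tagged + [("", "END")]:
        if pos in ("ADJ", "NOUN"):
            run.append((tok, pos))
            continue
        # Trim trailing adjectives so each phrase ends in a noun head.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(t for t, _ in run))
        run = []
    return phrases


def extract(text):
    """Run the three modules in sequence."""
    return noun_phrases(tag(tokenize(text)))


def precision_recall(extracted, reference):
    """Set-based precision and recall against a hand-built reference list."""
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = ("The patients showed chronic renal failure. "
              "Chronic renal failure is a risk in the patients.")
    phrases = extract(sample)
    counts = Counter(phrases)
    print(phrases)
    print("total:", sum(counts.values()), "unique:", len(counts))
```

A real tagger would need a far larger lexicon and contextual disambiguation; the default-to-noun rule here mislabels any verb or adverb missing from the toy lexicon, which is why the sample sentences use only listed verbs.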

Similar resources

Use of Articles in Learning English as a Foreign Language: A Study of Iranian English Undergraduates

The significance of error analysis for the learner, the teacher and the researcher is now widely recognized. Earlier studies of error analysis concentrated on intersystematic comparison of the “native language” and the “target language” and drew the required data largely from intuitions and impressionistic observations. This study was conducted on the basis of the following observations: (1) to...

Extracting Conceptual Terms from Medical Documents

Automated biomedical concept recognition is important for biomedical document retrieval and text mining research. In this paper, we describe a two-step concept extraction technique for documents in biomedical domain. Step one includes noun phrase extraction, which can automatically extract noun phrases from medical documents. Extracted noun phrases are used as concept term candidates which beco...

Extracting Noun Phrases in Subject and Object Roles for Exploring Text Semantics

In tune with the recent developments in the automatic retrieval of text semantics, this paper is an attempt to extract one of the most fundamental semantic units from natural language text. The context is intuitively extracted from typed dependency structures basically depicting dependency relations instead of Part-Of-Speech tagged representation of the text. The dependency relations imply deep...

Anaphora Resolution of Demonstrative Noun Phrases in Medline Abstracts

This paper reports our investigation of machine learning methods applied to anaphora resolution for Biology texts. Our primary concern is the investigation of features and their combinations for effective anaphora resolution. In this paper, we focus on the resolution of demonstrative anaphoric noun phrases. We propose several novel features that we call highlighting features and consider their ...

Concept Extraction in the Interspace Prototype

A comparison of four parsers was undertaken for noun phrase extraction: FastNPE, NPtool, Chopper, and AZ Phraser. FastNPE was found to be the fastest of the parsers, and NPtool the most accurate in extracting noun phrases. Both were subsequently implemented into the Concept Extractor module of the Interspace Prototype, which is described in detail.

Journal:
  • Proceedings. AMIA Symposium

Publication date: 1999